326 research outputs found
DROW: Real-Time Deep Learning based Wheelchair Detection in 2D Range Data
We introduce the DROW detector, a deep learning based detector for 2D range
data. Laser scanners are lighting invariant, provide accurate range data, and
typically cover a large field of view, making them interesting sensors for
robotics applications. So far, research on detection in laser range data has
been dominated by hand-crafted features and boosted classifiers, potentially
losing performance due to suboptimal design choices. We propose a Convolutional
Neural Network (CNN) based detector for this task. We show how to effectively
apply CNNs for detection in 2D range data, and propose a depth preprocessing
step and voting scheme that significantly improve CNN performance. We
demonstrate our approach on wheelchairs and walkers, obtaining state of the art
detection results. Apart from the training data, none of our design choices
limits the detector to these two classes, though. We provide a ROS node for our
detector and release our dataset containing 464k laser scans, out of which 24k
were annotated.
Comment: Lucas Beyer and Alexander Hermans contributed equally.
Towards a Principled Integration of Multi-Camera Re-Identification and Tracking through Optimal Bayes Filters
With the rise of end-to-end learning through deep learning, person detectors
and re-identification (ReID) models have recently become very strong.
Multi-camera multi-target (MCMT) tracking has not fully gone through this
transformation yet. We intend to take another step in this direction by
presenting a theoretically principled way of integrating ReID with tracking
formulated as an optimal Bayes filter. This conveniently side-steps the need
for data-association and opens up a direct path from full images to the core of
the tracker. While the results are still sub-par, we believe that this new,
tight integration opens many interesting research opportunities and leads the
way towards full end-to-end tracking from raw pixels.
Comment: First two authors have equal contribution. This is initial work into
a new direction, not a benchmark-beating method. v2 only adds
acknowledgements and fixes a typo in e-mail.
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Scaling laws have been recently employed to derive compute-optimal model size
(number of parameters) for a given compute duration. We advance and refine such
methods to infer compute-optimal model shapes, such as width and depth, and
successfully implement this in vision transformers. Our shape-optimized vision
transformer, SoViT, achieves results competitive with models that exceed twice
its size, despite being pre-trained with an equivalent amount of compute. For
example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ILSVRC2012,
surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical
settings, at less than half the inference cost. We conduct a thorough
evaluation across multiple tasks, such as image classification, captioning, VQA
and zero-shot transfer, demonstrating the effectiveness of our model across a
broad range of domains and identifying limitations. Overall, our findings
challenge the prevailing approach of blindly scaling up vision models and pave
a path for more informed scaling.
Comment: 10 pages, 7 figures, 9 tables. Version 2: Layout fixed.
Industrial Infrastructure: Translocal Planning for Global Production in Ethiopia and Argentina
Current development and re-development of industrial areas cannot be adequately understood without taking into account the organisational structures and logistics of commodity production on a planetary scale. Global production networks contribute not only to the reconfiguration of urban spatial and economic structures in many places, but they also give rise to novel transnational actor constellations, thus reconfiguring planning processes. This article explores such constellations and their urban outcomes by investigating two current cases of industrial development linked with multilateral transport-infrastructure provisioning in Ethiopia and Argentina. In both cases, international partners are involved, with stakeholders based in China playing particularly significant roles. In Mekelle, Ethiopia, we focus on the establishment of a commodity hub through the implementation of new industry parks for global garment production and road and rail connections to international seaports. In the Rosario metropolitan area in Argentina, major cargo rail and port facilities are under development to expand the country’s most important ports for soybean export. By mapping the physical architectures of the industrial and infrastructure complexes and their urban contexts and tracing the translocal actor constellations involved in infrastructure provisioning and operation, we analyse the spatial impacts of the projects as well as the related implications for planning governance. The article contributes to emergent scholarship and theorisations of urban infrastructure and global production networks, as well as policy mobility and the transnational constitution of planning knowledge and practices.